Deploying DeepSeek with Ollama: load balancing across multiple GPUs

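I won't go over installing Ollama itself. I have two RTX 4090 cards, and after ollama run the GPU usage showed that only one of them was in use. By default Ollama places a model on a single GPU when it fits there. To spread the load across both cards, add the following to the [Service] section of the service file: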

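Environment="CUDA_VISIBLE_DEVICES=0,1" which GPUs (by index) Ollama is allowed to see
Environment="OLLAMA_SCHED_SPREAD=1" spread the model across all visible GPUs instead of filling one card first
Environment="OLLAMA_KEEP_ALIVE=-1" keep the model loaded indefinitely, never unload it automatically
Environment="OLLAMA_HOST=0.0.0.0" listen address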
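Environment="OLLAMA_PORT=11434" listen port (11434 is already the default; Ollama's documented way to change the port is the host:port form of OLLAMA_HOST, e.g. OLLAMA_HOST=0.0.0.0:11434, so this variable may have no effect)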
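
Instead of editing the unit file in place, the same settings can be kept in a systemd drop-in. The following is only a minimal sketch, assuming the unit is named ollama.service and lives under /etc/systemd/system. The helper name write_ollama_dropin and its optional directory argument are made up for this example and are not part of Ollama or systemd.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in with the GPU-related settings.
# $1 is an optional target directory (an assumption of this sketch); the
# default is the standard drop-in directory for a unit named ollama.service.
write_ollama_dropin() {
    dropin_dir="${1:-/etc/systemd/system/ollama.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # Quoted heredoc delimiter: the content is written literally, no expansion.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Environment="CUDA_VISIBLE_DEVICES=0,1"
Environment="OLLAMA_SCHED_SPREAD=1"
Environment="OLLAMA_KEEP_ALIVE=-1"
EOF
}

# Typical use, as root:
#   write_ollama_dropin
```

A drop-in survives reinstalls that overwrite ollama.service, which is the main reason to prefer it; the effect on the running service should be the same as adding the lines to the unit directly.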

(base) root@ys:~# systemctl cat ollama
# /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/anaconda3/bin:/usr/local/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/>
Environment="CUDA_VISIBLE_DEVICES=0,1"
Environment="OLLAMA_SCHED_SPREAD=1"
Environment="OLLAMA_KEEP_ALIVE=-1"

[Install]
WantedBy=default.target
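
Changes to the unit only take effect after systemd reloads it and the service is restarted. The sketch below shows one way to check afterwards that both cards are holding model memory. The helper count_busy_gpus and the 1024 MiB threshold are assumptions made for this example, not something Ollama or nvidia-smi provides.

```shell
#!/bin/sh
# Hypothetical helper: count GPUs whose used memory (MiB) is above a threshold.
# Reads lines of the form "index, memory.used" on stdin, as printed by
#   nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits
# $1 is the threshold in MiB (default 1024, an arbitrary choice for this sketch).
count_busy_gpus() {
    awk -F', *' -v min="${1:-1024}" '$2 + 0 > min { n++ } END { print n + 0 }'
}

# Typical use, as root, after editing the unit:
#   systemctl daemon-reload
#   systemctl restart ollama
#   # send a request so the model is loaded, then:
#   nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | count_busy_gpus
```

With two 4090s and OLLAMA_SCHED_SPREAD=1, the count should come out as 2 once a model is loaded; a result of 1 suggests the settings were not picked up by the service.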